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Abstract. In this paper, we compare the performance of two scenario-based 
numerical methods to solve stochastic optimal control problems: scenario trees 
and particles. The problem consists in finding strategies to control a dynamical 
system perturbed by exogenous noises so as to minimize some expected cost 
along a discrete and finite time horizon. We introduce the Mean Squared 
Error (MSE) which is the expected L^-distance between the strategy given 
by the algorithm and the optimal strategy, as a performance indicator for 
the two models. We study the behaviour of the MSE with respect to the 
number of scenarios used for discretization. The first model, widely studied in 
the Stochastic Programming community, consists in approximating the noise 
diffusion using a scenario tree representation. On a numerical example, we 
observe that the number of scenarios needed to obtain a given precision grows 
exponentially with the time horizon. In that sense, our conclusion on scenario 
trees is equivalent to the one in the work by Shapiro (2006) and has been widely 
noticed by practitioners. However, in the second part, we show using the same 
example that, by mixing Stochastic Programming and Dynamic Programming 
ideas, the particle method described by Carpentier et al (2009) copes with this 
numerical difficulty; the number of scenarios needed to obtain a given precision 
now does not depend on the time horizon. Unfortunately, we also observe that 
serious obstacles still arise from the system state space dimension. 



Introduction 

Consider a controlled dynamical system, affected by some exogenous noises, say 
uncertain parameters for instance. Stochastic optimal control consists in driving 
this system, having at each time step partial or total observations of those noises, 
so as to minimize some expected cost integrated through time and while satisfying 
a number of constraints. Many applications can be handled through such a model. 
For instance, think of a power producer that has to plan the use of a set of power 
plants (the control would be the production of the plants at each time step) in order 
to supply an uncertain power demand (the noises) while minimizing a production 
cost over a certain time period. We here consider discrete and finite time horizon. 
We are hence looking for feedback functions which, at each time step and for each 
possible observation of the system, provide a decision to be taken. We consider 
the laws of the random variables involved to be continuous; consequently this is an 
infinite-dimensional optimization problem. In most situations, no analytic solution 
can be found and one has to consider approximations of the original problem to be 
able to solve it numerically. 
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In the manner of the Monte Carlo approach for computing the expectation of 
a random variable, we seek a tractable approximation of the probabilistic struc- 
ture of the original problem based on samples of some noise scenarios. A scenario 
consists in a sample of the noises along the whole time horizon. Note that such 
a resolution method is stochastic in itself. Indeed, the solution given by such a 
method is a random variable in the sense that it depends on the scenarios sam- 
pled and used in the resolution. Therefore, when studying the performance of an 
algorithm based on such an approach, a variance term naturally appears, repre- 
senting the sensitivity of the solution regarding these scenarios. This variance term 
is added to the squared bias term which arises from the necessity to approximate 
the solution in feedback by a function in a predefined space. Note that it is also 
present in the performance evaluation of deterministic techniques, like Dynamic 
Programming (Bertsekas, 2000) for instance. 

For stochastic optimal control problems, it is common to represent the diffu- 
sion of "likely futures" using a scenario tree structure, leading to so-called multi- 
stage stochastic programs. This kind of representation goes back to Dantzig (1955) 
and has been widely studied within the Stochastic Programming community (see 
Prekopa, 1995; Shapiro and Ruszczynski, 2003, for a broad overview of this ap- 
proach). This methodology consists in discretizing the probabilistic structure of 
the problem, then rewriting the constraints and objective function of the origi- 
nal problem on this discrete structure and finally solving it using a well-suited 
mathematical programming technique. In other words, the problem is solved for 
a particular sample of the random variables (a scenario tree) and the solution 
is hence a random variable in itself. We have to keep this fact in mind while 
evaluating the error. The main interest of such a methodology is that it leads 
back to a deterministic problem, on which we can apply classical tools of numer- 
ical optimization (Bonnans et al, 2006). For instance, one may then try to solve 
large-scale problems using decomposition techniques (see Carpentier et al, 1995; 
Higle and Sen, 1996; Shapiro and Ruszczynski, 2003, Ch. 3). 

Despite the benefits of this methodology, it has been already observed that, in 
order to obtain a given precision, the number of scenarios needed to build the 
scenario tree has to grow exponentially with the time horizon of the problem. One 
of the aims of this paper is to highlight and to quantify this practical observation on 
an example. Thus, we start in Section 1 by introducing the Mean Squared Error, 
that is the distance between the strategy obtained using scenario trees and the 
(supposedly unique) optimal strategy. This is the indicator we use to estimate the 
performance of the method. Then, in Section 2, we study on a numerical example 
the relation linking the Mean Squared Error to the number of scenarios used for 
the discretization. Shapiro (2006) obtains similar conclusions by working on the 
performance regarding the objective function and using large deviations tools. 

We show that this negative observation is to be attributed to the scenario tree 
approach rather than to be considered as a feature of the original problem. Indeed, 
when using the particle method (Dallagi, 2007; Carpentier et al, 2009), in Section 3 
on the same example, we show that the number of scenarios needed to obtain a 
given precision does not grow when the time horizon gets longer. This methodology, 
which mixes ideas from Stochastic Programming and Dynamic Programming, is a 
variational technique which consists in writing the optimality conditions of the 
optimization problem first, and then solving them using sampling techniques. This 
is a gradient-like method that builds an adaptive mesh over the state space as 
gradient iterations progress towards the solution. This mesh aims at discretizing the 
space in the regions mostly visited by the state vector at the optimum. According 
to the results obtained on the example we present here, the algorithm seems to be 
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well-suited for multi-stage problems. However, with this approach, it seems that 
the difficulties arising from the dimension of the state space remain; this point is 
discussed in Section 4. 



1. MATHEMATICAL FORMULATION 

Consider a controlled dynamic system, affected by random variables called noises. 
We aim to find strategies that minimize some expected cost over a finite time 
horizon, while satisfying a number of constraints. We suppose that the problem 
has a unique solution and propose to compare the approximate strategies using the 
Mean Squared Error (MSE). 



1.1. The problem. We denote by T the ffiiite time horizon of the problem and 
consider discrete time {0, . . . , T}. Let (fi, P) be a probability space. Three kinds 
of random variables on this space are involved in the problem^: 

• the statc^ X — {Xt)t=o t, the values of which lie in a finite-dimensional 

vector space X; 

• the control U = {Ut)t=o,...,T-i, the values of which lie in a finite-dimen- 
sional vector space U; 

• the noise W = {W t)t=Q,...,T, the values of which lie in a finite-dimensional 
vector space W, equipped with the cr-field W and the probability Pvk , which 
is the transport of the probability P by W. 

Since some of the numerical methods we use in the following are gradient-based, it is 
natural to assume that the state and control lie in a Hilbert space. Hence we assume 
all three random variables are in L^(fi,^, P). The noise variable W is given, i.e. 
its probability law is known, while the state and control variables are optimization 
variables. We here suppose that the information structure has perfect memory: at 
each time step t, the noise Wt is observed and kept in memory, in such a way 
that the decision at time t is based on the knowledge of (Wq, . . . .Wt)- Hence 
we introduce the a-field J-t = a{Wo, . . . jWt} that represents the information 
available at time t, the associated filtration {J't)t=o,...,T, and require the control at 
time t to be measurable with respect to ■ This constraint will be written: Ut ^ J^t ■ 
At time 0, we assume that the value of the state is the realization of some random 
variable Wo- Then, at each time step t, based on the available information, a 
decision Ut is taken, a cost Ct{Xt, Ut) is incurred and the state evolves according 
to Xt+i = ft{Xt, Ut, Wt+i)- At the end of the time period, a final cost V{Xt) 
is incurred. 

The aim is to find a pair {X , U) that minimizes the expected sum of the costs 
over the time horizon while satisfying the dynamics and measurability constraints. 



Throughout this paper, random variables will be denoted by bold letters (e.g. W £ 

l,2(n, A P; IR'')- 

^In the Stochastie Optimal Control framework, the terminology "state" has a rather precise 
sense: this is the minimal information that completely sums up the past of the system in order 
to compute its future behaviour and cost function knowing all future inputs (see Powell, 2007, 
Def. 5.4.1). In that sense, the variable X can only be considered as the state variable when 
assumption 1 is made later on in §1.2. Until then, our use of this terminology is somewhat 
abusive. 
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Mathematically speaking, the problem we consider may be formulated as follows: 

(la) min E ( Ct iXt,Ut) + V {Xt)] , 

(lb) s.t. Xt+i^ ftiXt,Ut,Wt+i), Vi = 0,...,T-l, 

(Ic) Xo = Wo, 

(Id) Ut^J'u yt = o,...,T. 

Equations (lb) and (Ic) describe the dynamics of the state variable and are P-almost 
sure relations. Equation (Id) states the information structure of the problem: it 
is the measurability constraint that forces the decision to be a function of all the 
past noises, and to be independent of future noise values. Hence we refer to this 
relation as the non-anticipativity constraint. 

1.2. Mean Squared Error (MSE) for strategies. Now suppose that Prob- 
lem (1) has a unique solution denoted by {X* ^ U*). Suppose that we also have a 
numerical method, based on a scenario discretization, that gives us an approximate 
solution of (1) that we denote by {X\U^). Since this approximate solution given 
by the numerical method depends on, say, samples, the pair {X^U^) lies in 
the probability space associated with these samples, that is (W, W, Ph')®^. For 
instance, scenario tree-based methods compute values of the optimal decision for a 
finite number of state values that depend on the samples used for building the tree 
structure. 

We make the following assumption: 

Assumption 1. Random variables Wq, ■ ■ ■ , Wt are independent. 

Under this assumption, the problem lies in the framework of Dynamic Pro- 
gramming (DP). In particular, we know that optimal controls can be expressed as 
feedback functions depending only on the state variable x. In other words, there 
exists some sequence of functions 7* such that = j^{Xl). 

In order to be able to compare the optimal and approximate solutions, we need 
to produce a solution of the same nature as the optimal solution, that is a feedback 
function such as 7* out of {X\ [/'). We hence suppose that we have such an inter- 
polation/regression operator that gives us a feedback function = (r5)t=o,...,T out 
of the approximate solution {X\ U^). More details on the interpolation/regression 
operators we use are given in §2.2. Note that the approximate strategy f" = 
(r5)t=o,...,T-i is a random variable that lies in the samples probability space intro- 
duced earlier, that is (W,>V,Pw)®^. 

We are interested in evaluating the efficiency of our numerical method for solving 
Problem (1) by computing the distance between the approximate strategy and 
the optimal one 7*. Since those functions take values over the state space, we need 
to define some measure on this space. In our context, the measure that seems 
the most natural is the one corresponding to the density of the optimal state -X"*. 
Indeed, using this measure, we allocate more weight to regions which are often 
"visited" by the optimal state and we do not take into account decisions in regions 
that are never "visited" by the optimal state, since those decisions are unlikely to 
be used in practice. We denote by fi^ the density of the optimal state at time t. 

We can now introduce the MSE associated with strategy F*, which is our per- 
formance indicator in this study: 

MSE^E |xiXl|T**(^)-rS(2:)||%:(x)dxJ . 
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Note that the expectation in the MSE is taken over all possible samples; in other 
words, it lies on the samples probability space (W, W, Pvr)®''^- The MSE is thus 
the expected distance between the optimal strategy and the approximate strategy. 
Recall that this expectation is taken with respect to the drawings used by the 
algorithm to produce strategy . It is common to decompose the MSE using the so- 
called variance and squared bias. For this purpose, we introduce the expectation 7" 
of the approximate feedback function tK Then 



MSE : 



T-l 

E 

t=0 



It (x) - 7t {x) fJ.*t {x) dx 



Squared bias 



(2) 



E 



Variance 



The squared bias is the square of the distance between the optimal strategy and 
the expected approximate strategy, while the variance is the expected square of the 
distance between the approximate strategy and the expected approximate strategy. 
Both arc real values. 

Remark 1. Another relevant choice as a performance indicator would have been 
the following: 

(1) Having the approximate strategy F", build the"true" state and control ob- 
tained while simulating this strategy, that is the pair {X\ C7^) that satisfies 
the dynamics (lb) and (Ic): 



Xl ^ Wo, 



Xli = ft 



xl,ul Wt+i) , with u] ^ Tlixl), yt = 0, 



,T- 1. 



(2) Since the optimal decision U* is a random variable on (51, A, P) and the 
decision is a random variable that lies on the tensor product of (il. A, P) 
and the samples probability space (W, W, P^)^^, compute the expecta- 



tion: 



E 



U'' -U* 

where the expectation lies on the tensor product of the probability spa- 
ces in,A,f) and (W,>V,Pw)®^. Note that this quantity may also be 
t _ rr* II 2 -j^ where the expectation now lies on the sam- 



writtenE( II (7^-17 



pies probability space only. 



This quantity is another relevant performance indicator since it is positive and it 
equals zero if and only if equals U* almost surely. Moreover, we note that 
the choice of for evaluating the performance of the numerical method is the 
one made by Shapiro (2006), even though the comparison is performed using the 
objective function rather than the strategics. 

1.3. Computing the MSE. For computing the squared bias and the variance as 
defined in Equation (2), we must face two main issues. 

(1) At each time step t, we must compute integrals over the whole state space 
with respect to the density /z^. In the following example, we are able to 
explicitly compute the optimal control strategy. We can then numerically 
integrate the Fokker-Planck equation to derive the density of the optimal 
state on a sufficiently dense grid and perform the integration by quadrature 
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techniques. However, this technique suffers for the same curse of dimen- 
sionality as DP. For this reason we choose to perform the integration using 
a Quasi-Monte Carfo technique. This samphng method aims at distribut- 
ing the samples associated with density equaUy over the state space 
using so-called low-discrepancy quasi-random sequences. This provides a 
very efficient convergence speed for the numerical computation of expecta- 
tions. There exist numerous methods of Quasi-Monte Carlo sampling (see 
Niederreiter, 1992, for a survey of these methods). 
(2) We must be able to compute the expectation in the variance term. This 
expectation is taken over the samples used during the algorithm to produce 
the approximate strategy, e.g. the scenarios in the case of a scenario tree 
technique. We approximate this expectation by Monte Carlo, by performing 
the experiment a great number of times independently one from another. 
In our case, 10^ experiments were enough to precisely evaluate the variance. 

We are now able to estimate the efficiency of numerical methods using the MSE. 
We point out the relation between the error and the parameters of the methods, 
namely the number of scenarios used to discretize randomness. 

2. Scenario tree-based methods 

In the context of discrete time and finite horizon stochastic optimal control 
problems, a popular and widely used resolution technique consists in approximating 
the information structure of Problem (1) using scenario trees. This methodology 
has been widely studied by the Stochastic Programming community. We do not 
detail this method and refer to Shapiro and Ruszczynski (2003) for a more in depth 
presentation of Stochastic Programming. 

2.1. Brief presentation. Because of the measurability constraints in Problem (1), 
and since there is no reason for the noise probability densities to have finite support, 
decision variables X and U are in general infinite-dimensional. Hence, except for 
some very particular cases where an explicit solution can be found, solving the 
underlying optimization problem directly is intractable. Scenario trees provide 
a way to approximate filtrations using noise samples to produce finite support 
approximations. 

Let Af be the set of all nodes in the tree, TZ be the set of the root nodes and C 
the set of the leaves. Building a scenario tree consists in defining the following 
functions: 

• the time function 6 : A/" — >■ {0, . . . ,T} which, with every node, associates 
the corresponding time step and the corresponding multi-application 
which, to every time step t E {0, . . . , T}, associates the set of all nodes at 
time t; 

• the function v : Af\TZ J\f\C which, with each non-root node, associates 
the preceding node in the tree; 

• the weight function tt : A/" — > [0, 1] which, with every node in the tree, 
associates the probability to go through that node (the sum of all 7r(j), 
with i G S^^{t) is equal to 1, for each t); 

• the multi-application F = which, with every node i in the tree, as- 
sociates the set of its successors (we impose the convention that F{i) = 
0,V^ e C); 

• the function F'^ which, with each node z, associates the whole sub-tree 
starting from i: F+{i) = F{i) U F'^ii) U ■ • ■ U F^-^(*)(i). 

We consider a regularly branching tree, i.e. with a constant branching factor de- 
noted by nb, so that card(F(i)) = ?if,, for every node i G J\f\C At time t ~ 0, 
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we draw a nf,-sample of the random variable Wq, that we denote by (Vl^")ige-i(o) • 
For each root node i, we draw at time t ~ I a, rifc-sample of the random vari- 
able Wi^, that we denote by {W'-')j^p(^iy Samples of the random variable Wi are 
drawn independently root node per root node. This procedure continues until final 
time t = T, so that we have n^'^^ leaves to the tree. Hence one can note that this 
can reveal a heavy procedure when T becomes large, since we need to draw n'^'^^ 
samples of Wt- 

With this material we can now define a discretized version of Problem (1): 



(3a) min ^ 7r(z) ■ Ce(,) C/'^ + ^ 1^ 

(3b) s.c. X" = (x"'«,t7"^«,T¥'^), V^eA/■\7^, 

(3c) X" = W'\ Vi e n. 



The expectation in the objective function (la) has been replaced in (3a) by a dis- 
crete weighted sum of costs on the tree nodes. Equations (3b) and (3c) correspond 
to Equations (lb) and (Ic). The only constraint that seems to be missing is Equa- 
tion (Id). Actually, it is now reflected in the structure of the tree. Indeed, starting 
from any node in the tree, there is only one possibility for going back to the root 
(we know exactly the values of all past noises), but there is a whole sub-tree that 
goes to the set of leaves, corresponding to the final time step (we only know the 
laws of future noises). In other words, the non-anticipativity constraint is coded in 
the structure of the tree. 

In most cases, the scenario tree is not drawn directly. One draws a number of 
scenarios independently from one another, and then one builds a tree structure. 
Building such a tree in an efficient way (regarding the implied discretization error) 
is a challenging task. There exists a large literature on the way one can build sce- 
nario trees in a reasonable way (Pfiug, 2001; Heitsch and Romisch, 2003), as well 
as on stability of the underlying optimization problem regarding this discretiza- 
tion (Heitsch et al, 2006). We will not focus on this point here. As it has already 
been mentioned, we suppose that the tree has been built using a branching fac- 
tor rib, which does not depend on time, leading to n^^^ leaves. Recall that this 
procedure requires the drawing of iV nl+^ different scenarios. 

2.2. Case study. We apply this methodology to a simple instance where Prob- 
lems (1) and (3) can be solved analytically. Then, we apply the methodology 
described in §1.3 in order to compute the MSE and to point out its relation with 
the number of scenarios N used to discrctize the original problem. Let e be some 
positive real number, the problem we consider is the following: 



(4a) min K^^^Ul + X^y 

(4b) s.c. Xt+i= Xt + Ut + Wt+i, Vi = 0,...,T- 1, 

(4c) Xo = Wo, 

(4d) C/t ^ J-*. 



The state and control variables lie in L^(ri, A, P; M). Noise variables Wq, . . . , Wt 
are i.i.d. random variables with uniform law on [—1, 1]. One easily shows that the 



This tree building procedure, often called conditional sampling, usually requires to draw 
samples of W\ knowing the value of the preceding noises, that is Wq. Because of Assumption 1, 
this is not needed here. 
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optimal control for Problem (4) reads = 7f*(Xj) with: 

(5) ^* (^) = "T^TT7' ^"'^^^ 

We now solve the problem on the tree. On each node i, one has a noise sam- 
ple W'^ and solving the problem on the tree leads to values X'* and f/'* for the 
state and control, respectively. One easily shows that: 

1 — 0[i) + e 

We now need to build a feedback function from these state and control values. This 
can be performed using a interpolation/regression operator. We here use the sim- 
plest one, which is the nearest neighbour interpolation operator (see Gersho and Gray, 
1992, §10.4, for a rigorous study of this operator and its possible implementa- 
tions). At each time step t, we build the Voronoi diagram of the set of state 
values {X"}jgg-i(j). We denote by C'* the cell for which X" is the center. This 
cell is a random variable in itself, because it depends on the random variables drawn 
to build the scenario tree. On this cell, the feedback function will be constant, equal 
to t/". Thus we obtain the following strategy at time t: 

(6) t\{x)= J2 f^''lc'-(^)- 

ie0-i(t) 

Remark 2. The fact that the Voronoi cells are random variables, depending on 
the scenarios, makes the theoretical study of the error associated with strategy r" 
intricate. Indeed, in order to compute the expected approximate strategy 7", as well 
as to evaluate the variance term in Equation (2), we have to compute expectations 
over all possible scenario drawings, leading through Equation (6) to expectations 
over all possible Voronoi cells. This is generally a challenging task. This is the 
reason why we here appeal to numerical experiments to study the behaviour of the 
error with respect to the number of scenarios. 

The results for this approach arc presented in Figure 1 for T = 4 and a branching 
factor of nt = 3. The approximate strategy Fq for the first time step is hence a 
piecewise constant function with 3 pieces. On the right-hand side of Figure 1, we 
draw 4 samples of this strategy (dashed curves) to emphasize that the scenario-tree 
method is stochastic: each sample Fg is derived from a different scenario tree. We 
also draw the averaged strategy 70 over the 10^ scenarios (dotted curve). One can 
remark that even if sample strategies are not continuous functions, the average 
strategy seems to be continuous, even smooth. On the left-hand side of figure 1, 
we draw the exact strategy (solid curve) obtained using equation (5) and the same 
average approximate strategy as on the right-hand side (dotted curve) . 

We can observe both the variance and the bias terms of the MSE on Figure 1: 
the variance term is the average L^-distance between the dashed curves (of course 
if there were 10^ of them) and the dotted curve on the right hand-side, whereas the 
bias term is the L^-distance between the solid curve and the dotted curve on the 
left hand-side. From Equation (4c), the density of the optimal state, under which 
we compute the L^-distances cited above, equals the density of Wq, which follows 
a uniform law on [—1, 1]. 

We are now able to compute the MSE using the protocol described in §1.3, for 
several values of the branching factor. We draw in Figure 2 the evolution of both 
the squared bias and the variance with respect to the branching factor, for the 
strategy at each time step (here T = 4). For computational time reasons, we were 
not able to perform this experiment for strategies at the third and fourth time steps 
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Figure 1 . Strategics for scenario trees 



when the branching factor is more than 10 (the number of nodes involved at these 
time steps becomes too large). 




Figure 2. Squared bias and variance for scenario trees 



On this figure which uses a logarithmic scaling on both axes, one can observe 
that the variance term on the right-hand side obviously dominates the squared bias 
term on the left-hand side. Moreover, while the convergence rate for the variance 
seems to be constant with respect to time, the bias decreases with a highest rate 
as time grows. This is natural since the tree has more nodes in the last time steps 
than in the first ones to represent the approximate strategy. 

A careful inspection of the variance shows that the MSE appears experimentally 
to have a convergence rate of for every time step. In other words, the number 
of scenarios needed to obtain a given precision varies exponentially with respect 
to the time horizon: it is proportional to n^. Still, recall that in order to have 
a branching factor ni, equal to 10^, since we have a 4-period problem, we need to 
draw 10® scenarios! 



3. Particle methods 

We here propose a similar study using another numerical method based on sce- 
narios: the particle method. This variational technique consists in first writing the 
optimality conditions for the problem. Then, scenarios are used to compute the 
expectations that lie in the optimality conditions. We briefly present the idea of 
this method and then study its error with respect to the number of scenarios used 
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for sampling. A detailed description can be found in the work by Carpentier et al 
(2009). 

3.1. Brief presentation. The particle method consists in solving first-order opti- 
mality conditions that have been approximated through sampling. We first present 
the optimality conditions and then describe the sampling procedure. The resulting 
approximate optimality conditions are solved numerically using a gradient tech- 
nique. 

We suppose that the cost function and the system dynamics are differentiable 
with respect to both the state and the control, and that their derivatives are 
Lipschitz-continuous. 

We suppose that Assumption 1 is still in force. In this context, Carpentier et al 
(2009, Theorem 2.6) state that, if a solution {X* , U*) to Problem 1 exists, then, for 
every time step t = 0, . . . , T, there exists a random variable A^ G ^^(fi. A, P; X), 
called the adjoint state, such that the following conditions hold: 

(7a) XI = Wo, 

(7b) x;+, = ft{xiuiWt+^), 

(7c) A*^ = — {X*^)^, 



dx 

dCt -r^^.T . jr, f d ft 



(7d) A*^_L(x:,[/:)' +Ei^-^iXlu:,Wt+i)' A^+i 

(7e) = ^ {XI u:f + E [ 1^ {x;, uiWt+^f a;^. 



x; 
x: 



Note that these optimality conditions are specific to the Markovian case and 
are not obtained in a straightforward manner. The authors first write first-order 
optimality conditions conditionally to the filtration Ft ■ Then Assumption 1 is used 
at each time step to recursively replace the conditional expectations with respect 
to J-'t by conditional expectations with respect to the optimal state X^. 

Equations (7a) and (7b) simply recall the state dynamics of the system. Equa- 
tions (7c) and (7d) introduce a backwards relation on the adjoint states. Actually, 
the adjoint state at time t may be seen as the sensitivity of the optimal cost from 
time t to the end of the time horizon with respect to the state variable at time t. 
Hence there exists a relation between Equations (7c) and (7d) and the classical DP 
equation. Finally, Equation (7e) states that the derivative of the optimal future 
cost at a given time step with respect to the control at the same time step equals 
zero at the optimum. This is the classical first-order optimality condition when no 
additional constraints (apart for the non-anticipativity constraints) are imposed to 
the control. 

Note that the adjoint state A^^i in Equations (7d) and (7e) is, by construction, 
measurable with respect to the state variable X^^i ~ ft {X^, U^, Wt+i)- Because 
of Assumption 1, the conditional expectations in Equations (7d) and (7e) arc actu- 
ally expectations that are only supported by the random variable Wt+i- One has 
to keep this in mind when discretizing these conditions. 

We now want solve these conditions numerically using sampling techniques and 
gradient methods. Suppose we have N samples W'^ , . . . , W'^ , called scenarios or 
noise particles, for the random variable W . After the fc-th iteration, suppose we 
have N samples for the current state X''^', control U''-'^-', and adjoint state A'''-'. 
In order to compute conditions (7d) and (7e), we need to evaluate expectations of 
functions involving Aj^\, knowing the value X^*^-* of the state at time t, for every 
particle k = 1, . . . , TV. We have two ingredients to perform this operation: 
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(1) at the optimum, wc know that for every time step t = 0, ... ,T — 1, the 
adjoint state A^^i is a function of the state Xl_f_i = , C/j , VFf+i), 
where is measurable with respect to X^, and Wt+i is independent 
ofX*; 

(2) we have N samples of all the random variables involved, namely X , C/''^-' , 
A^'') and W. 

All we need is to define a regression operator, based on the current state and adjoint 
state samples, that associates with every value of the current state an estimate for 
the corresponding value of the adjoint state. At time t, we denote this regression 
operator by At. Using this material, we are now able to write the discretized 
optimality conditions, namely the initial condition on the state for every particle 
i^l,...,N: 

(8a) X'^ = W'^, 

the state dynamics for every time step t = 0, ... ,T — 1, and for every particle 
i-l,...,iV: 

(8b) x;Vi = /*(x^c7^Ty;^o, 

the final condition on the adjoint state, for every particle i = 1, . . . ,N: 
(8c) A- = ^{x-)\ 

the backwards adjoint state dynamics, for every time step t ~ T — 1, ... ,0 and 
every particle i = 1, . . . , iV: 

+ ^ (xf,C/;\Ty;^i)^ At+1 (/, {x'^,U^,Wt?)) 

and the stationarity condition, for every time step t — T—1, . . . ,0 and every particle 
i = l,...,iV: 

+ ^ (x^c/jM^;^,)^ At+i (/, {x';,utwt^) 

Note that in both equations (8d) and (8e), the sums only involve the index j, and 
hence the noise variable only. This corresponds to the fact that the conditional 
expectations in (7d) and (7e) with respect to X^ are in fact expectations. 

The algorithm then consists in solving these conditions using a gradient method. 
Thus, each iteration k of the particle method consists in three steps: 

(1) integrate the N state dynamics (8a) and (8b) using the current control 
particles t/'^'^, 

(2) integrate the N backwards adjoint state dynamics (8c) and (8d) using the 
current state and noise particles; 

(3) compute the N "gradient particles" G'^''''', . . . , G'^'^''^ (the right-hand side 
of equation (8e)) and update every control particle i using a gradient step: 
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The algorithm stops when all gradient particles are sufficiently small, ensuring that 
condition (8e) is almost fulfilled, for every i = 1, . . . , N. 

Remark 3. Note that, at each iteration, we have N samples (or particles) for the 
state and control at every time step. Using a regression operator, we are able to 
compute a feedback function such as the one introduced in Section 1.2, denoted 
by r" . This feedback function associates a decision with every possible state of the 
system and this is the quantity on which we base our error analysis. 

3.2. Case study, using the same example as in §2.2, we represent in the right- 
hand side of Figure 3 some samples of the approximate strategy (dashed curves) 
obtained by the particle method using 3* scenarios for discretization. Observe that 
these are piecewise constant functions with 3"*^ pieces, instead of only 3 pieces when 
using scenario trees. Indeed, using the particle method one has the same number 
of nodes at every time step. In other words, all the scenarios are used at each time 
step. In the left-hand side of the same figure we draw the average approximate 




state state 



Figure 3. Strategies for particle methods 

strategy (dotted curve) and the optimal strategy (solid curve). The bias term in 
the MSE is still the L^-distance between the dotted curve and the solid curve, while 
the variance term is the average of the i^-distances between the dashed curves and 
the dotted curve. Using the same number of scenarios as we did for for scenario 
trees in §2.2, we observe that both the bias and variance look much smaller to those 
observed in Figure 1. 

We can now compute the MSE for several values of the number N of scenarios 
used for discretization. We observe in figure 4 how the squared bias and the variance 
associated with each strategy, i.e. for each time step, behave when the number of 
scenarios grows. Note that the squared bias, on the left-hand side is dominated by 
the variance, on the right-hand side. A careful inspection of the latter shows that 
the term leads experimentally to a convergence rate that is slightly less than N^^. 
When comparing with the results obtained using scenario trees, one should notice 
that the cc-axes do not refer to the same quantities in Figures 2 and 4. Hence, we 
here observe a convergence rate in N~^, N being the number of scenarios, while 
we observed a convergence rate of = N^t in the case of scenario trees, which 
is much smaller. 

Unlike scenario trees, the error associated with particle methods does not depend 
on the time horizon on this example: it is the same at every time step. 

4. The state space dimension issue 

Most numerical methods that aim at solving stochastic optimal control problems 
encounter difhculties when the dimension of the state space becomes large. Since we 
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Figure 4. Squared bias and variance for particles. 



are looking for strategies, that is functions that map a decision from every possible 
state of the system, even just storing such functions becomes challenging when the 
dimension of the state space is large. In the stochastic optimal control community 
framework, this difficulty is known as the curse of dimensionality. 

Note that particle methods are not prone to the same curse of dimensionality 
as DP, for instance. While DP encounters difficulties because it requires to build 
a somewhat regular grid to cover the state space, particle methods are designed 
to construct an adaptive mesh over the state space: the state particles are built 
so as to concentrate in regions that are often visited by the optimal state. As 
such, regardless of the dimension of the state space, whenever the dispersion of the 
optimal state is small, the particle method shall produce satisfying results. 

In the example we here study, the density of the optimal state does not seem 
to have this "tightness property". On a simple extension of Problem (4) to a two- 
dimensional state space case, we observe in Figure 5 how the squared bias and the 
variance associated with each strategy, i.e. for each time step, behave when the 
number of scenarios grows. Note that the squared bias, on the left-hand side, now 
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Figure 5. Squared bias and variance for particles in dimension 2. 

prevails over the variance term, on the right-hand side. A careful inspection of the 
latter shows that the term leads experimentally to a convergence that is slightly less 
than N^'^-^. The variance itself also seems to decrease at a lower rate than in the 
one-dimensional case. Both of these observations indicate that the performance 
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of the method could benefit from a better regression operator than the nearest 
neighbour that we used here. 

Note that, for computational time reasons, we cannot produce such experimental 
results for higher dimensions. Hence we are not able to make a thorough study of 
the relation between the dimension of the state space and the convergence rate of 
both the squared bias and the variance. 

On the other hand, when dealing with large state space dimensions, a clever 
idea is to make use of decomposition schemes in order to replace the solving of 
a high dimensional problem by the iterative solving of several smaller problems. 
This is indeed a common and powerful technique when using scenario trees (see 
Carpentier et al, 1995; Higle and Sen, 1996; Shapiro and Ruszczynski, 2003, Ch. 
3) . Both the improvement of the regression operator and the use of a decomposition 
scheme are the subject of ongoing research. 

Conclusion 

We presented a numerical study concerning the comparison of two scenario- 
based approaches for solving stochastic optimal control problems with a discrete 
finite horizon. In the first section, we introduced the performance indicator that 
was used to compare both methods, namely the Mean Squared Error (MSE). 

The first approach we considered was scenario tree modeling. We observed that 
the number of scenarios needed to obtain a given accuracy grew exponentially 
with the time horizon, making the implementation for multi-stage problems hardly 
tractable. 

The second approach we considered was particle methods. In this case, we 
observed that the associated MSE docs not depend on time. Thus, this recent ap- 
proach seems to be well-suited for multi-stage problems. Unfortunately, it seems 
that its performance deteriorates rapidly when the dimension of the state space 
increases. However, this method is quite new and as such has received little at- 
tention. Its implementation and hence performance would certainly benefit from 
computational studies. Future works will be concerned with this topic. 
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